Spatiotemporal immune atlas of the first clinical-grade, gene-edited pig-to-human kidney xenotransplant

Downstream Analysis of CD45+ immune cells (General analytics pipeline exemplified herein)

Overview

Our overarching aim in this vignette is to analyse CD45+ immune cells (3’ Gene Expression scRNA-seq) from the 10-GE porcine right xenograft using Seurat.

Analysis summary:

- Counts were processed using the standard Seurat (v4.3.0.1) workflow

- The human-pig CD45+ (hpcd45) dataset consists of 9,085 input single cells 

- Number of cells after filtering: 6,513

- Key filtering thresholds: nFeature_RNA > 200  & nFeature_RNA < 3500 & percent.mt.ss11 < 4 & percent.mt.hg19 < 12 

- Normalization method used: LogNormalize

- Dimensions (dims) used for the uniform manifold approximation and projection (UMAP) algorithm (RunUMAP()) and for k-nearest neighbor (KNN) graph construction (FindNeighbors()): 1:20, chosen from the PCA analysis

Downstream Analysis

Loading required packages

suppressPackageStartupMessages({
  library(Seurat)
  library(cowplot)
  library(dplyr)
  library(Matrix)
  library(reticulate)
  library(monocle)
  library(WebGestaltR)
  library(harmony)
  library(MAST)
  library(devtools)
  library(ggplot2)
  library(patchwork)
  library(SeuratData) 
  library(SeuratWrappers)
  library(hdf5r)
  library(ape)
  library(Rfast2)
  library(RColorBrewer)
  library(viridis)
  library(data.table)
  library(gridExtra)
  library(purrr)
  library(usefun)
  library(formattable)
  library(splitstackshape)
  library(formatR)
  library(venn)
  library(VennDiagram)
  library(Hmisc)
  library(interp)
  library(knitr)
})

Set working directory

setwd("/Users/Anchor/projects/xeno_manuscript/hpcd45")

We’ll start our analyses by assessing and removing ambient RNA contamination before proceeding with further downstream QC and analyses.

Removing Ambient RNA Using SoupX

Droplet-based single-cell RNA sequencing assays assume that all acquired RNAs are endogenous to cells. However, any cell-free RNA contained in the input solution is also captured. This sequencing of cell-free RNA constitutes background contamination that can confound the correct biological interpretation of single-cell transcriptomic data. Contamination from this “soup” of cell-free RNA is ubiquitous and experiment-specific in its composition and magnitude, and it can lead to erroneous biological conclusions. SoupX quantifies the extent of the contamination and estimates “background-corrected” cell expression profiles that can be integrated with existing downstream analysis tools. SoupX reduces batch effects, strengthens cell-specific quality control, and improves biological interpretation.

The method to do this consists of three parts:

  1. Calculate the profile of the soup
  2. Estimate the cell specific contamination fraction
  3. Infer a corrected expression matrix

Various approaches of running soupX to estimate and remove soup contamination have been suggested here:

https://cran.r-project.org/web/packages/SoupX/readme/README.html and here:

https://rawcdn.githack.com/constantAmateur/SoupX/204b602418df12e9fdb4b68775a8b486c6504fe4/inst/doc/pbmcTutorial.html

1. Defining your own clusters

hpcd45.filt.matrix <- Read10X_h5("outs/filtered_feature_bc_matrix.h5",use.names = T)
hpcd45.raw.matrix  <- Read10X_h5("outs/raw_feature_bc_matrix.h5",use.names = T)

str(hpcd45.raw.matrix)
str(hpcd45.filt.matrix)

hpcd45.Seurat.Object  <- CreateSeuratObject(counts = hpcd45.filt.matrix)
hpcd45.Seurat.Object

soup.channel  <- SoupChannel(hpcd45.raw.matrix, hpcd45.filt.matrix)
soup.channel

hpcd45.Seurat.Object[["percent.mt"]] <- PercentageFeatureSet(hpcd45.Seurat.Object, pattern = "^MT-")

hpcd45.Seurat.Object <- NormalizeData(hpcd45.Seurat.Object, normalization.method = "LogNormalize", scale.factor = 10000)
hpcd45.Seurat.Object <- FindVariableFeatures(hpcd45.Seurat.Object, selection.method = "vst", nfeatures = 2000)
all.genes <- rownames(hpcd45.Seurat.Object)
hpcd45.Seurat.Object <- ScaleData(hpcd45.Seurat.Object, features = all.genes)
hpcd45.Seurat.Object <- RunPCA(hpcd45.Seurat.Object, features = VariableFeatures(object = hpcd45.Seurat.Object))
hpcd45.Seurat.Object <- RunUMAP(hpcd45.Seurat.Object, dims = 1:30)
hpcd45.Seurat.Object <- FindNeighbors(hpcd45.Seurat.Object, dims = 1:30)
hpcd45.Seurat.Object <- FindClusters(hpcd45.Seurat.Object)

meta    <- hpcd45.Seurat.Object@meta.data
umap    <- hpcd45.Seurat.Object@reductions$umap@cell.embeddings
soup.channel  <- setClusters(soup.channel, setNames(meta$seurat_clusters, rownames(meta)))
soup.channel  <- setDR(soup.channel, umap)
head(meta)

soup.channel  <- autoEstCont(soup.channel)

head(soup.channel$soupProfile[order(soup.channel$soupProfile$est, decreasing = T), ], n = 20)

adj.matrix  <- adjustCounts(soup.channel, roundToInt = T)

2. Automatic method to estimate the contamination fraction and decontaminate the data, leveraging clustering information from Cell Ranger

sc1 = load10X("outs/")
str(sc1)
sc1 = autoEstCont(sc1)
out1 = adjustCounts(sc1)
dim(out1)
DropletUtils::write10xCounts("hpcd45_soupX_filtered", out1) # write10xCounts is exported, so :: suffices; we shall use results from this run

3. Manually loading and decontaminating the data; here we load clusters from the Cell Ranger analysis

table_of_counts <- Read10X_h5("outs/filtered_feature_bc_matrix.h5",use.names = T)
table_of_droplets  <- Read10X_h5("outs/raw_feature_bc_matrix.h5",use.names = T)

sc2  <- SoupChannel(table_of_droplets, table_of_counts) 
head(sc2$metaData)

cluster_labels <- read.csv("outs/analysis/clustering/graphclust/clusters.csv")

all.equal(rownames(sc2$metaData), cluster_labels$Barcode)

sc2 = setClusters(sc2, cluster_labels$Cluster)
sc2 = autoEstCont(sc2)
out2 = adjustCounts(sc2)

Loading SoupX-Corrected Data

We’ll load the data using the Read10X() function. The Read10X() function reads in the output of the cellranger pipeline from 10X, returning a unique molecular identifier (UMI) count matrix. The values in this matrix represent the number of molecules for each feature (i.e. gene; row) that are detected in each cell (column).

We next use the count matrix to create a Seurat object. The object serves as a container that has both data (like the count matrix) and analysis (like PCA, or clustering results) for a single-cell dataset. For a technical discussion of the Seurat object structure, check Seurat’s GitHub Wiki. For example, the count matrix is stored in hpcd45[["RNA"]]@counts.

#Loading soupX filtered data
hpcd45.soupX.Filtered <- Read10X(data.dir = "soupX_hpcd45_filt")
dim(hpcd45.soupX.Filtered)
## [1] 41918  9085
str(hpcd45.soupX.Filtered)
## Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   ..@ i       : int [1:12631288] 38 197 302 315 336 344 391 402 406 450 ...
##   ..@ p       : int [1:9086] 0 356 3706 4359 5133 8154 11037 12027 12742 13373 ...
##   ..@ Dim     : int [1:2] 41918 9085
##   ..@ Dimnames:List of 2
##   .. ..$ : chr [1:41918] "hg19_OR4F16" "hg19_TNFRSF4" "hg19_TNFRSF18" "hg19_ATAD3B" ...
##   .. ..$ : chr [1:9085] "AAACCCAAGATTCGAA-1" "AAACCCAAGTGCCTCG-1" "AAACCCACAACCGACC-1" "AAACCCACACCTTCGT-1" ...
##   ..@ x       : num [1:12631288] 1 0 2 1 1 1 1 1 1 2 ...
##   ..@ factors : list()
#Initialize the Seurat object with the raw (non-normalized data)
hpcd45 <- CreateSeuratObject(counts = hpcd45.soupX.Filtered, project = "dec1", min.cells = 5, min.features = 200)
hpcd45
## An object of class Seurat 
## 24568 features across 8908 samples within 1 assay 
## Active assay: RNA (24568 features, 0 variable features)
str(hpcd45)
## Formal class 'Seurat' [package "SeuratObject"] with 13 slots
##   ..@ assays      :List of 1
##   .. ..$ RNA:Formal class 'Assay' [package "SeuratObject"] with 8 slots
##   .. .. .. ..@ counts       :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   .. .. .. .. .. ..@ i       : int [1:12601549] 26 122 191 200 209 216 247 254 258 289 ...
##   .. .. .. .. .. ..@ p       : int [1:8909] 0 356 3705 4358 5131 8152 11033 12022 12737 13367 ...
##   .. .. .. .. .. ..@ Dim     : int [1:2] 24568 8908
##   .. .. .. .. .. ..@ Dimnames:List of 2
##   .. .. .. .. .. .. ..$ : chr [1:24568] "hg19-TNFRSF4" "hg19-TNFRSF18" "hg19-ATAD3B" "hg19-THAP3" ...
##   .. .. .. .. .. .. ..$ : chr [1:8908] "AAACCCAAGATTCGAA-1" "AAACCCAAGTGCCTCG-1" "AAACCCACAACCGACC-1" "AAACCCACACCTTCGT-1" ...
##   .. .. .. .. .. ..@ x       : num [1:12601549] 1 0 2 1 1 1 1 1 1 2 ...
##   .. .. .. .. .. ..@ factors : list()
##   .. .. .. ..@ data         :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   .. .. .. .. .. ..@ i       : int [1:12601549] 26 122 191 200 209 216 247 254 258 289 ...
##   .. .. .. .. .. ..@ p       : int [1:8909] 0 356 3705 4358 5131 8152 11033 12022 12737 13367 ...
##   .. .. .. .. .. ..@ Dim     : int [1:2] 24568 8908
##   .. .. .. .. .. ..@ Dimnames:List of 2
##   .. .. .. .. .. .. ..$ : chr [1:24568] "hg19-TNFRSF4" "hg19-TNFRSF18" "hg19-ATAD3B" "hg19-THAP3" ...
##   .. .. .. .. .. .. ..$ : chr [1:8908] "AAACCCAAGATTCGAA-1" "AAACCCAAGTGCCTCG-1" "AAACCCACAACCGACC-1" "AAACCCACACCTTCGT-1" ...
##   .. .. .. .. .. ..@ x       : num [1:12601549] 1 0 2 1 1 1 1 1 1 2 ...
##   .. .. .. .. .. ..@ factors : list()
##   .. .. .. ..@ scale.data   : num[0 , 0 ] 
##   .. .. .. ..@ key          : chr "rna_"
##   .. .. .. ..@ assay.orig   : NULL
##   .. .. .. ..@ var.features : logi(0) 
##   .. .. .. ..@ meta.features:'data.frame':   24568 obs. of  0 variables
##   .. .. .. ..@ misc         : list()
##   ..@ meta.data   :'data.frame': 8908 obs. of  3 variables:
##   .. ..$ orig.ident  : Factor w/ 1 level "dec1": 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..$ nCount_RNA  : num [1:8908] 1058 12717 1305 1250 10585 ...
##   .. ..$ nFeature_RNA: int [1:8908] 347 3291 642 766 2961 2792 971 695 610 557 ...
##   ..@ active.assay: chr "RNA"
##   ..@ active.ident: Factor w/ 1 level "dec1": 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..- attr(*, "names")= chr [1:8908] "AAACCCAAGATTCGAA-1" "AAACCCAAGTGCCTCG-1" "AAACCCACAACCGACC-1" "AAACCCACACCTTCGT-1" ...
##   ..@ graphs      : list()
##   ..@ neighbors   : list()
##   ..@ reductions  : list()
##   ..@ images      : list()
##   ..@ project.name: chr "dec1"
##   ..@ misc        : list()
##   ..@ version     :Classes 'package_version', 'numeric_version'  hidden list of 1
##   .. ..$ : int [1:3] 4 1 3
##   ..@ commands    : list()
##   ..@ tools       : list()
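Rather than reaching into object slots directly, the same count matrix can be pulled with Seurat's accessor functions; a minimal sketch (Seurat v4 accessor syntax) with two sanity checks against the automatically computed per-cell metadata:

```r
library(Seurat)

# Accessor equivalent of hpcd45[["RNA"]]@counts
raw_counts <- GetAssayData(hpcd45, slot = "counts")
dim(raw_counts)  # genes x cells

# Per-cell totals and detected-gene counts should match the metadata columns
stopifnot(all(colSums(raw_counts) == hpcd45$nCount_RNA))
stopifnot(all(colSums(raw_counts > 0) == hpcd45$nFeature_RNA))
```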

Standard Pre-processing Workflow

The steps below encompass the standard pre-processing workflow for scRNA-seq data in Seurat. These represent the selection and filtration of cells based on QC metrics, data normalization and scaling, and the detection of highly variable features.

Quality Control (QC)

Seurat allows you to easily explore QC metrics and filter cells based on any user-defined criteria. A few QC metrics commonly used by the community include:

  • The number of unique genes detected in each cell
      ◦ Low-quality cells or empty droplets will often have very few genes
      ◦ Cell doublets or multiplets may exhibit an aberrantly high gene count
  • The total number of molecules detected within a cell (this correlates strongly with the number of unique genes)
  • The percentage of reads that map to the mitochondrial genome
      ◦ Low-quality or dying cells often exhibit extensive mitochondrial contamination
      ◦ We calculate mitochondrial QC metrics with the PercentageFeatureSet() function, which calculates the percentage of counts originating from a set of features
      ◦ We use the set of all genes starting with MT- as the set of mitochondrial genes
# Calculating proportion of transcripts mapping to mitochondrial genes

# Filtering on human and pig mitocondrial genes as separate entities
# First we'll store mitochondrial percentage for both pig and human in the meta data object and filter on these respectively
# mitochondrial ratio: this metric will give us a percentage of cell reads originating from 
# the mitochondrial genes Seurat has a convenient function that allows us to calculate the 
# proportion of transcripts mapping to mitochondrial genes. The PercentageFeatureSet() 
# will take a pattern and search the gene identifiers. For each column (cell) it 
# will take the sum of the counts slot for features belonging to the set, divide by the 
# column sum for all features and multiply by 100

#We'll store the percentage of reads mapping to the mitochondrial genome in the metadata as a separate "percent.mt" column for each species, since mitochondrial gene names differ between the two references
(mito_genes_human <- rownames(hpcd45)[grep("MT-", rownames(hpcd45))])
##  [1] "hg19-MT-ND1"  "hg19-MT-ND2"  "hg19-MT-CO1"  "hg19-MT-CO2"  "hg19-MT-ATP8"
##  [6] "hg19-MT-ATP6" "hg19-MT-CO3"  "hg19-MT-ND3"  "hg19-MT-ND4L" "hg19-MT-ND4" 
## [11] "hg19-MT-ND5"  "hg19-MT-ND6"  "hg19-MT-CYB"
hpcd45[["percent.mt.ss11"]] <- PercentageFeatureSet(hpcd45, features = c("ss11-ND1", "ss11-ND2", "ss11-COX1", "ss11-COX2", "ss11-ATP8", "ss11-ATP6", "ss11-COX3", "ss11-ND3", "ss11-ND4L", "ss11-ND4", "ss11-ND5", "ss11-ND6", "ss11-CYTB"))

hpcd45[["percent.mt.hg19"]] <- PercentageFeatureSet(hpcd45, features = c("hg19-MT-ND1", "hg19-MT-ND2",  "hg19-MT-CO1", "hg19-MT-CO2",  "hg19-MT-ATP8", "hg19-MT-ATP6", "hg19-MT-CO3",  "hg19-MT-ND3",  "hg19-MT-ND4L", "hg19-MT-ND4" , "hg19-MT-ND5",  "hg19-MT-ND6" , "hg19-MT-CYB" )) 

#The number of unique genes and total molecules are automatically calculated during `CreateSeuratObject()` and are stored in the object metadata as nFeature_RNA and nCount_RNA, respectively.
hpcd45@meta.data %>% 
  head(n=5)
##                    orig.ident nCount_RNA nFeature_RNA percent.mt.ss11
## AAACCCAAGATTCGAA-1       dec1       1058          347            0.00
## AAACCCAAGTGCCTCG-1       dec1      12717         3291            0.00
## AAACCCACAACCGACC-1       dec1       1305          642            0.00
## AAACCCACACCTTCGT-1       dec1       1250          766            4.32
## AAACCCACACTAGGCC-1       dec1      10585         2961            0.00
##                    percent.mt.hg19
## AAACCCAAGATTCGAA-1       0.2835539
## AAACCCAAGTGCCTCG-1       4.0418338
## AAACCCACAACCGACC-1       1.3026820
## AAACCCACACCTTCGT-1       0.0000000
## AAACCCACACTAGGCC-1       2.9570146

Feature plots before QC

Visualize QC metrics and use the resulting plots to guide cell filtering

p1 <- VlnPlot(hpcd45, features = c("nFeature_RNA"), ncol = 1, cols= "skyblue") + theme_light(base_size = 14) + theme(legend.position = "none", plot.title = element_text(size = 14, face = "bold", hjust = 0.5))
p2 <- VlnPlot(hpcd45, features = c("nCount_RNA"), ncol = 1, cols= "skyblue") + theme_light(base_size = 14) + theme(legend.position = "none", plot.title = element_text(size = 14, face = "bold", hjust = 0.5))
p3 <- VlnPlot(hpcd45, features = c("percent.mt.ss11"), ncol = 1, cols= "skyblue") + theme_light(base_size = 14) + theme(legend.position = "none", plot.title = element_text(size = 14, face = "bold", hjust = 0.5))
p4 <- VlnPlot(hpcd45, features = c("percent.mt.hg19"), ncol = 1, cols= "skyblue") + theme_light(base_size = 14) + theme(legend.position = "none", plot.title = element_text(size = 14, face = "bold", hjust = 0.5))
grid.arrange(p1, p2, p3, p4, ncol=4)

# FeatureScatter is typically used to visualize feature-feature relationships, but can be used
# for anything calculated by the object, i.e. columns in object metadata, PC scores etc.

plot1 <- FeatureScatter(hpcd45, feature1 = "nCount_RNA", feature2 = "percent.mt.ss11", cols= "gray50")
plot2 <- FeatureScatter(hpcd45, feature1 = "nCount_RNA", feature2 = "percent.mt.hg19", cols= "gray50")
plot3 <- FeatureScatter(hpcd45, feature1 = "nCount_RNA", feature2 = "nFeature_RNA", cols= "gray50")
plot1 + plot2 + plot3

Shown above are scatter plots of features (typically feature expression) across a set of single cells. The Pearson correlation between the two features is displayed above each plot.
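The displayed coefficients can be reproduced directly from the object metadata; a minimal sketch using base R's cor():

```r
# Pearson correlations, as reported in the FeatureScatter plot titles
meta <- hpcd45@meta.data
round(cor(meta$nCount_RNA, meta$nFeature_RNA,    method = "pearson"), 2)
round(cor(meta$nCount_RNA, meta$percent.mt.ss11, method = "pearson"), 2)
round(cor(meta$nCount_RNA, meta$percent.mt.hg19, method = "pearson"), 2)
```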

More exploration on the data and distributions

df <- as.data.table(hpcd45@meta.data)
sel <- c("orig.ident", "nCount_RNA", "nFeature_RNA", "percent.mt.ss11", "percent.mt.hg19")
df <- df[, sel, with = FALSE]
df[1:4, ]
##    orig.ident nCount_RNA nFeature_RNA percent.mt.ss11 percent.mt.hg19
## 1:       dec1       1058          347            0.00       0.2835539
## 2:       dec1      12717         3291            0.00       4.0418338
## 3:       dec1       1305          642            0.00       1.3026820
## 4:       dec1       1250          766            4.32       0.0000000
fontsize <- 10
linesize <- 0.35

gp.ls <- df[, 2:5] %>% imap( ~ {
  
  give.n <- function(x) {
    return(c(y = median(x) + max(x) / 10, label = round(median(x), 2)))
  }
  
  col.ls <-
    setNames(
      c('gray40', 'gray50', 'gray70', 'gray90', "gray" ),
      c("nCount_RNA", "nFeature_RNA", "percent.mt.ss11", "percent.mt.hg19", "log10GenesPerUMI")
    )
  
  ggplot(data = df, aes(x = orig.ident, y = .x)) +
    geom_violin(trim = FALSE, fill = col.ls[.y]) +
    ggtitle(label = .y) + ylab(label = .y) +
    theme_bw() +
    theme(
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank(),
      strip.background = element_blank(),
      panel.border = element_blank()
    ) +
    theme(
      axis.text = element_text(size = fontsize),
      axis.line = element_line(colour = "black", size = linesize),
      axis.ticks = element_line(size = linesize),
      axis.title.x = element_blank(),
      axis.ticks.length = unit(.05, "cm"),
      plot.title = element_text(size = fontsize + 2, hjust = 0.5),
      legend.position = 'none'
    ) +
    stat_summary(fun = median, geom = "point", col = "black") + 
    stat_summary(fun.data = give.n,
                 geom = "text",
                 col = "black") + theme_light()
})
## Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.
grid.arrange(gp.ls[[1]], gp.ls[[2]], gp.ls[[3]],gp.ls[[4]], ncol = 4)

Percent MT Distribution - Human and Pig

#Density plot
p1 <- hpcd45@meta.data %>% 
  ggplot(aes(x = hpcd45@meta.data$percent.mt.hg19)) +
  geom_density() + scale_color_manual(values = c("blue")) + 
  theme_classic() +
  geom_vline(aes(xintercept = mean(hpcd45@meta.data$percent.mt.hg19)),
             color="blue", linetype="dashed", size = 0.5) + ggtitle("Human") + 
  theme(plot.title = element_text(hjust=0.5, face="bold")) +
  scale_x_continuous(breaks = seq(0, 50, by = 1))  
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
p2 <- hpcd45@meta.data %>% 
  ggplot(aes(x = hpcd45@meta.data$percent.mt.ss11)) +
  geom_density() + scale_color_manual(values = c("blue")) + 
  theme_classic() +
  geom_vline(aes(xintercept = mean(hpcd45@meta.data$percent.mt.ss11)),
             color="blue", linetype="dashed", size = 0.5) + ggtitle("Pig") + 
  theme(plot.title = element_text(hjust=0.5, face="bold")) + 
  scale_x_continuous(breaks = seq(0, 50, by = 1)) 


grid.arrange(p1, p2, nrow=2)

Number of cell counts per sample before filtering

metadata <- hpcd45@meta.data
# Add cell IDs to metadata
metadata$cells <- rownames(metadata)

# Rename columns
metadata <- metadata %>%
  dplyr::rename(nUMI = nCount_RNA,
                nGene = nFeature_RNA)

unique(metadata$orig.ident)
## [1] dec1
## Levels: dec1
# Visualize the number of cell counts per sample
metadata %>% 
  ggplot(aes(x=orig.ident, fill=orig.ident)) + 
  geom_bar(color = "gray80", fill = "gray80") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) +
  theme(plot.title = element_text(hjust=0.5, face="bold")) +
  ggtitle("NCells") + theme(legend.position = "none") + 
  theme(legend.position = "none") + 
  geom_text(stat='count', aes(label=..count..), vjust = 0.5)
## Warning: The dot-dot notation (`..count..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(count)` instead.

Number UMIs/transcripts per cell

#Visualize the number UMIs/transcripts per cell
ggplot(metadata, aes(x = nUMI)) + 
  geom_histogram(aes(y = ..density..),
                 alpha = 0.3, color="gray50", fill="white") +
  scale_x_log10() + 
  theme_classic() +
  ylab("Cell density/UMI counts per cell") +
  geom_vline(xintercept = 500) + theme(legend.position = "none")+
  geom_density(lwd = 0.5, colour = 4,
               fill = 4, alpha = 0.1)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#The UMI counts per cell should generally be above 500; that is the low end of what we expect. If UMI counts are between 500 and 1,000, the data are usable, but the cells should probably have been sequenced more deeply
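To quantify how many cells sit in each of those depth bands, the thresholds above can be tabulated directly; a minimal sketch, assuming the renamed metadata columns (nUMI) from the chunk further up:

```r
# Cells per sequencing-depth band (thresholds from the comment above)
table(cut(metadata$nUMI,
          breaks = c(-Inf, 500, 1000, Inf),
          labels = c("<500", "500-1000", ">1000")))
```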

More on Data and QC

counts <- Matrix(hpcd45@assays$RNA@counts)
counts_per_cell <- Matrix::colSums(counts)
counts_per_gene <- Matrix::rowSums(counts)
genes_per_cell1 <- Matrix::colSums(counts>0) #count a gene only if it has non-zero reads mapped.
cells_per_gene <- Matrix::rowSums(counts>0) #only count cells where the gene is expressed

counts_per_cell <- as.data.frame(colSums(counts))
counts_per_gene <- as.data.frame(rowSums(counts))
genes_per_cell <- as.data.frame(colSums(counts>0)) 
cells_per_gene <- as.data.frame(rowSums(counts>0) )

colnames(counts_per_cell) <- "counts"
colnames(counts_per_gene) <- "counts"
colnames(genes_per_cell) <- "genes_per_cell"
colnames(cells_per_gene) <- "cells_per_gene"

df <- cbind(counts_per_cell, genes_per_cell)

ggplot(df, aes(x=counts, y=genes_per_cell)) + geom_point(color="gray30") + scale_y_continuous(trans='log10') + scale_x_continuous(trans='log10') + theme_light()

#Plot cells ranked by their number of detected genes.
genes_per_cell$cells <- rownames(genes_per_cell)

#The bends in the curve at the upper and lower ends give a good clue on what thresholds to set:
ggplot(genes_per_cell, aes(x=reorder(cells, genes_per_cell), y=genes_per_cell)) + geom_point() + 
  scale_y_continuous(trans='log10', breaks=seq(0, 5000, by = 1000)) + ggtitle("Genes per Cell") + theme_test(base_size = 12) + 
  labs(x= "Cells", y="Number of Genes") + 
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank(), plot.title = element_text(size = 14, face = "bold", hjust = 0.5))  

Data Filtering

(Count93_nCount_RNA <- quantile(hpcd45@meta.data$nCount_RNA, 0.93)) # 93rd-percentile values hint at possible thresholds, but take them with a grain of salt and consult the plots above as well
##      93% 
## 12311.06
(Count93_nFeature_RNA <- quantile(hpcd45@meta.data$nFeature_RNA, 0.93))
##  93% 
## 3145
(Count93_percent.mt.ss11 <-  quantile(hpcd45@meta.data$percent.mt.ss11, 0.93))
##      93% 
## 3.815247
(Count93_percent.mt.hg19 <-  quantile(hpcd45@meta.data$percent.mt.hg19, 0.93))
##      93% 
## 11.08643
summary(hpcd45@meta.data$nCount_RNA)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     500    1330    2142    4301    6223   36405
summary(hpcd45@meta.data$nFeature_RNA)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     201     630     906    1384    2068    5913
hpcd45 <- subset(hpcd45, subset = nFeature_RNA > 200 & nFeature_RNA < 3500 & percent.mt.ss11 < 4 & percent.mt.hg19 < 12)

VlnPlot(hpcd45, features = c("nFeature_RNA", "nCount_RNA", "percent.mt.ss11", "percent.mt.hg19"), ncol = 2, pt.size = 0.1, cols= "skyblue")

Data after filtering

metadata <- hpcd45@meta.data
# Add cell IDs to metadata
metadata$cells <- rownames(metadata)

# Rename columns
metadata <- metadata %>%
  dplyr::rename(nUMI = nCount_RNA,
                nGene = nFeature_RNA)

unique(metadata$orig.ident)
## [1] dec1
## Levels: dec1
# Visualize the number of cell counts per sample
metadata %>% 
  ggplot(aes(x=orig.ident, fill=orig.ident)) + 
  geom_bar(color = "gray80", fill = "gray80") +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) +
  theme(plot.title = element_text(hjust=0.5, face="bold")) +
  ggtitle("NCells") + theme(legend.position = "none") + 
  theme(legend.position = "none") + 
  geom_text(stat='count', aes(label=..count..), vjust = 0.5)

Doublet Removal

Detection of doublets was conducted in Python using Scrublet, and a file containing the Scrublet calls/predictions was written out. These predictions were then loaded into R and used as the basis for filtering out doublets.

Visualization of the doublet predictions in a 2-D embedding/UMAP: predicted doublets should mostly co-localize (possibly in multiple clusters). If they do not, you may need to adjust the doublet score threshold, or change the pre-processing parameters to better resolve the cell states present in your data.

As a sanity check, the simulated-doublet histogram below should typically be bimodal. The left mode corresponds to “embedded” doublets generated by two cells with similar gene expression. The right mode corresponds to “neotypic” doublets, which are generated by cells with distinct gene expression (e.g., different cell types) and are expected to introduce more artifacts in downstream analyses. Scrublet can only detect neotypic doublets. This histogram is an important diagnostic plot, and the doublet score threshold should separate the two shoulders of the bimodal distribution:

#Loading scrublet predictions
dim(scrublet_calls <- read.csv("scrublet_predictions/scrublet_calls.csv")) 
## [1] 9085    3
table(scrublet_calls$predicted_doublets)# 303 doublets found
## 
## False  True 
##  8782   303
dim(scrublet_calls <- scrublet_calls[which(scrublet_calls$X %in% rownames(hpcd45@meta.data)),])
## [1] 7490    3
rownames(scrublet_calls) <- scrublet_calls$X
scrublet_calls$X <-NULL
dim(scrublet_calls)
## [1] 7490    2
#Adding doublet information to metadata
#First we'll ensure that the rownames in hpcd45 match the rownames in scrublet_calls. AddMetaData matches on rownames, but we reorder explicitly to guarantee that each prediction maps to its respective barcode
scrublet_calls <- scrublet_calls[rownames(hpcd45@meta.data), ]
head(rownames(scrublet_calls))
## [1] "AAACCCAAGATTCGAA-1" "AAACCCAAGTGCCTCG-1" "AAACCCACAACCGACC-1"
## [4] "AAACCCACACTAGGCC-1" "AAACCCACACTTCAAG-1" "AAACCCAGTTACACTG-1"
head(rownames(hpcd45@meta.data))
## [1] "AAACCCAAGATTCGAA-1" "AAACCCAAGTGCCTCG-1" "AAACCCACAACCGACC-1"
## [4] "AAACCCACACTAGGCC-1" "AAACCCACACTTCAAG-1" "AAACCCAGTTACACTG-1"
hpcd45 <- AddMetaData(hpcd45, scrublet_calls)
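The bimodal-histogram diagnostic described above is produced by Scrublet from simulated doublets, but the distribution of the observed doublet scores can also be inspected in R once the calls are attached to the metadata; a minimal sketch (the 0.25 threshold line is purely illustrative, not Scrublet's chosen cutoff):

```r
# Histogram of observed doublet scores; a well-chosen threshold should fall
# between the two modes when the distribution is bimodal
ggplot(hpcd45@meta.data, aes(x = doublet_scores)) +
  geom_histogram(bins = 50, fill = "gray70", color = "gray30") +
  geom_vline(xintercept = 0.25, linetype = "dashed", color = "red3") + # example threshold
  theme_classic() +
  labs(x = "Scrublet doublet score", y = "Number of cells")
```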

#Working directly on the un-normalized data, we first want to visualize where the predicted doublets fall in our dataset
hpcd45_2 <- hpcd45
hpcd45_2 <- FindVariableFeatures(hpcd45_2, selection.method = "vst", nfeatures = 2500)
hpcd45_2 <- ScaleData(object = hpcd45_2, scale.max = 30,  verbose = FALSE)
hpcd45_2 <- RunPCA(object = hpcd45_2, npcs = 30, verbose = FALSE)
hpcd45_2 <- FindNeighbors(hpcd45_2, dims = 1:20, verbose = TRUE, reduction = "pca")
## Computing nearest neighbor graph
## Computing SNN
hpcd45_2 <- RunUMAP(hpcd45_2, dims = 1:20, verbose = TRUE, reduction = "pca")
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
## 10:15:33 UMAP embedding parameters a = 0.9922 b = 1.112
## 10:15:33 Read 7490 rows and found 20 numeric columns
## 10:15:33 Using Annoy for neighbor search, n_neighbors = 30
## 10:15:33 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 10:15:34 Writing NN index file to temp file /var/folders/f_/kwrjzcys6250cp1plhxyg21h0000gn/T//Rtmp1CSpnV/file1c195ee14d55
## 10:15:34 Searching Annoy index using 1 thread, search_k = 3000
## 10:15:36 Annoy recall = 99.93%
## 10:15:36 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 10:15:37 Initializing from normalized Laplacian + noise (using irlba)
## 10:15:37 Commencing optimization for 500 epochs, with 345110 positive edges
## 10:15:50 Optimization finished
hpcd45_2 <- FindClusters(hpcd45_2, verbose = TRUE) #FindClusters has no reduction argument; resolution can be adjusted via `resolution`
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 7490
## Number of edges: 257933
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8702
## Number of communities: 19
## Elapsed time: 0 seconds
FeaturePlot(hpcd45_2, features = "doublet_scores", pt.size = 0.01)

DimPlot(hpcd45_2, group.by = "predicted_doublets", pt.size = 0.01, cols = c("gray90", "firebrick3"))

#Checking the nUMI for doublets and singlets
VlnPlot(hpcd45_2,
        features = "nCount_RNA",
        pt.size = 0,
        group.by = "predicted_doublets") + NoLegend()

#Fractions of doublets per cluster
df <- data.table(hpcd45_2@meta.data)

perc <- as.data.frame(df %>% 
                        group_by(seurat_clusters, predicted_doublets) %>%
                        dplyr::summarise(cnt = n()) %>%
                        mutate(freq = formattable::percent(cnt / sum(cnt), digits = 5)))
## `summarise()` has grouped output by 'seurat_clusters'. You can override using
## the `.groups` argument.
perc$predicted_doublets <- as.character(perc$predicted_doublets)
perc$predicted_doublets[perc$predicted_doublets == "True"] <- "Doublet"
perc$predicted_doublets[perc$predicted_doublets == "False"] <- "Singlet"
perc %>% 
  ggplot() +
  geom_bar(aes(x = seurat_clusters, y=freq,
               group = predicted_doublets,
               fill = predicted_doublets),
           stat = "identity", width = 0.99, alpha = 0.8) +
  theme_test()+ 
  labs(y=paste0("% Distribution of doublets and singlets per cluster"), x="") +
  scale_fill_manual(values = c("Doublet" = 'red3', "Singlet" = "gray80")) +
  theme(legend.position = "right") +scale_y_continuous(expand = c(0,0))

#Next we'll remove the doublets and see what the data looks like
hpcd45_2 <- hpcd45_2[, hpcd45_2@meta.data[, "predicted_doublets"] == "False"]
unique(hpcd45_2@meta.data$predicted_doublets)
## [1] "False"
DimPlot(hpcd45_2, group.by = "predicted_doublets", pt.size = 0.01, cols = c("gray90", "firebrick3"), label = TRUE)

VlnPlot(hpcd45_2, features = c("nFeature_RNA", "nCount_RNA", "percent.mt.ss11", "percent.mt.hg19"), ncol = 2, pt.size = 0)

Filtering cells to remove doublets

hpcd45 <- hpcd45[, hpcd45@meta.data[, "predicted_doublets"] == "False"]
unique(hpcd45@meta.data$predicted_doublets)
## [1] "False"

Normalization

After removing unwanted cells from the dataset, the next step is to normalize the data. Based on our July 29 meeting, we decided to apply LogNormalize to this dataset, as this method better represents the underlying biology of the data. The “LogNormalize” method normalizes the feature expression measurements for each cell by the total expression, multiplies this by a scale factor (10,000 by default), and log-transforms the result. Normalized values are stored in hpcd45[["RNA"]]@data.
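The LogNormalize transformation described above can be reproduced by hand; a self-contained sketch on a toy matrix (base R only) of the formula log1p(counts / per-cell total × 10,000):

```r
# Toy counts: 3 genes x 2 cells (each cell sums to 10 for easy checking)
toy <- matrix(c(1, 4, 5, 2, 0, 8), nrow = 3,
              dimnames = list(paste0("gene", 1:3), paste0("cell", 1:2)))

# LogNormalize: per-cell total-count scaling, then natural-log(1 + x)
log_normalize <- function(m, scale.factor = 1e4) {
  log1p(sweep(m, 2, colSums(m), "/") * scale.factor)
}

log_normalize(toy)  # e.g. gene1/cell1: log1p(1/10 * 1e4) = log1p(1000)
```

This matches Seurat's NormalizeData(normalization.method = "LogNormalize", scale.factor = 10000) up to numerical precision.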

hpcd45 <- NormalizeData(hpcd45, normalization.method = "LogNormalize", scale.factor = 10000) 

#Identification of highly variable features (feature selection)
hpcd45 <- FindVariableFeatures(hpcd45, selection.method = "vst", nfeatures = 3000)

#Identify the 10 most highly variable genes
top10 <- head(VariableFeatures(hpcd45), 10)

#Plot variable features with and without labels
plot1 <- VariableFeaturePlot(hpcd45)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)
## When using repel, set xnudge and ynudge to 0 for optimal results

Scaling the Data and Performing Linear Dimensional Reduction

Next we perform PCA on the scaled data. By default, only the previously determined variable features are used as input, but a different subset can be chosen via the features argument.

all.genes.hpcd45 <- rownames(hpcd45)
hpcd45 <- ScaleData(object = hpcd45, scale.max = 30,  verbose = FALSE) 
hpcd45 <- RunPCA(object = hpcd45, npcs = 30, verbose = FALSE) #performing PCA on the scaled data

Seurat provides several useful ways of visualizing both cells and features that define the PCA, including VizDimLoadings(), DimPlot(), and DimHeatmap()

# Examine and visualize PCA results a few different ways
print(hpcd45[["pca"]], dims = 1:5, nfeatures = 5)
## PC_ 1 
## Positive:  hg19-MT-CO3, hg19-MT-ND4, hg19-MT-CO1, hg19-MT-ATP6, hg19-MT-CYB 
## Negative:  ss11-ENSSSCG00000012119, ss11-B2M, ss11-ENSSSCG00000033262, ss11-CFL1, ss11-RPS23 
## PC_ 2 
## Positive:  ss11-CD14, ss11-ENSSSCG00000030088, ss11-F13A1, ss11-VSIG4, ss11-MSR1 
## Negative:  ss11-ETS1, ss11-SCML4, ss11-CST7, ss11-CD2, ss11-MED12L 
## PC_ 3 
## Positive:  ss11-ADGRG1, ss11-HPGD, hg19-ANXA2, hg19-CTSB, hg19-RPLP0 
## Negative:  ss11-MSR1, ss11-PKIB, ss11-ENSSSCG00000006350, ss11-JAML, ss11-MMP9 
## PC_ 4 
## Positive:  ss11-HPGD, ss11-ADGRG1, ss11-CD300C, ss11-PLPP1, ss11-ENSSSCG00000013788 
## Negative:  hg19-CTSB, hg19-RPLP0, hg19-ANXA2, hg19-RPL3, hg19-S100A10 
## PC_ 5 
## Positive:  ss11-RNF128, ss11-PLTP, ss11-DNASE1L3, ss11-PMEPA1, ss11-NAPSA 
## Negative:  ss11-S100A8, ss11-ENSSSCG00000006588, ss11-PPBP, ss11-S100A12, ss11-ENSSSCG00000026043
VizDimLoadings(hpcd45, dims = 1:2, reduction = "pca")

In particular, DimHeatmap() allows for easy exploration of the primary sources of heterogeneity in a dataset, and can be useful when trying to decide which PCs to include for further downstream analyses. Both cells and features are ordered according to their PCA scores. Setting cells to a number plots the ‘extreme’ cells on both ends of the spectrum, which dramatically speeds plotting for large datasets. Though clearly a supervised analysis, we find this to be a valuable tool for exploring correlated feature sets.

DimHeatmap(hpcd45, dims = 1, cells = 500, balanced = TRUE)

DimHeatmap(hpcd45, dims = 1:24, cells = 500, balanced = TRUE)

Determine the Dimensionality of the Dataset

To overcome the extensive technical noise in any single feature for scRNA-seq data, Seurat clusters cells based on their PCA scores, with each PC essentially representing a ‘metafeature’ that combines information across a correlated feature set. The top principal components therefore represent a robust compression of the dataset. However, how many components should we choose to include? 10? 20? 100?

We shall use the ‘Elbow plot’ (ElbowPlot() function): a ranking of principal components based on the percentage of variance explained by each one.

ElbowPlot(hpcd45, ndims = 30) #determining dimensionality of dataset

Identifying the true dimensionality of a dataset can be challenging/uncertain. It is therefore recommended to consider these three approaches as well: The first is more supervised, exploring PCs to determine relevant sources of heterogeneity, and could be used in conjunction with GSEA for example. The second implements a statistical test based on a random null model, but is time-consuming for large datasets, and may not return a clear PC cutoff. The third is a heuristic that is commonly used, and can be calculated instantly.
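The statistical test mentioned above is implemented in Seurat as JackStraw(). A hedged sketch of how it would be applied here (not run in this vignette, since it is slow for datasets of this size; num.replicate = 100 is the tutorial default):

```r
# Permutation-based significance test for each PC (time-consuming):
# randomly permute a subset of the data and compare observed PC scores
# to this null distribution
hpcd45 <- JackStraw(hpcd45, num.replicate = 100)
hpcd45 <- ScoreJackStraw(hpcd45, dims = 1:20)

# Significant PCs show a strong enrichment of low p-values
# (solid curves above the dashed uniform-distribution line)
JackStrawPlot(hpcd45, dims = 1:20)
```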

Alternative method to determine number of principal components

#Determine percent of variation associated with each PC
pct <- hpcd45[["pca"]]@stdev / sum(hpcd45[["pca"]]@stdev) * 100

#Calculate cumulative percents for each PC
cumulative_percentage <- cumsum(pct)

#Determine the first PC where the cumulative percent exceeds 90% and the PC itself explains less than 5% of variation
pcs.perc <- which(cumulative_percentage > 90 & pct < 5)[1]
pcs.perc
## [1] 25
#Determine the difference in % variation between each PC and the subsequent PC
var.pcs <- sort(which((pct[1:(length(pct) - 1)] - pct[2:length(pct)]) > 0.1), decreasing = TRUE)[1] + 1

#last point where change of % of variation is more than 0.1%.
var.pcs
## [1] 12
#Minimum of the two calculations
pcs <- min(pcs.perc, var.pcs)
pcs
## [1] 12
#Create a dataframe with values
plot_df <- data.frame(pct = pct, 
           cumulative_percentage = cumulative_percentage, 
           rank = 1:length(pct))

#Elbow plot to visualize 
ggplot(plot_df, aes(cumulative_percentage, pct, label = rank, color = rank > pcs)) + 
  geom_text() + 
  geom_vline(xintercept = 90, color = "blue", linetype = "dashed", size = 0.5) + 
  geom_hline(yintercept = min(pct[pct > 5]), color = "blue", linetype = "dashed", size = 0.5) +
  theme_light() + 
  scale_colour_discrete(l = 40)

However, we still see some degree of variance explained by PCs 16 through 20, so we’ll consider PCs 1:20 for the downstream steps

Cell Clustering

Seurat v3 or higher applies a graph-based clustering approach, building upon initial strategies in (Macosko et al). Importantly, the distance metric which drives the clustering analysis (based on previously identified PCs) remains the same. However, the approach used to partition the cellular distance matrix into clusters has dramatically improved. The approach was heavily inspired by recent manuscripts which applied graph-based clustering approaches to scRNA-seq data [SNN-Cliq, Xu and Su, Bioinformatics, 2015] and CyTOF data [PhenoGraph, Levine et al., Cell, 2015]. Briefly, these methods embed cells in a graph structure - for example a K-nearest neighbor (KNN) graph, with edges drawn between cells with similar feature expression patterns, and then attempt to partition a graph into highly interconnected ‘quasi-cliques’ or ‘communities’.

First we construct a KNN graph based on the euclidean distance in PCA space, and refine the edge weights between any two cells based on the shared overlap in their local neighborhoods (Jaccard similarity). This step is performed using the FindNeighbors() function, and takes as input the previously defined dimensionality of the dataset (first n PCs that have been chosen).

To cluster the cells, modularity optimization techniques such as the Louvain algorithm (default) or SLM [SLM, Blondel et al., Journal of Statistical Mechanics] are applied to iteratively group cells together, with the goal of optimizing the standard modularity function. The FindClusters() function implements this procedure, and contains a resolution parameter that sets the ‘granularity’ of the downstream clustering, with increased values leading to a greater number of clusters. Setting this parameter between 0.4 and 1.2 typically returns good results for single-cell datasets of around 3K cells; optimal resolution often increases for larger datasets. The clusters can be retrieved using the Idents() function.

hpcd45 <- FindNeighbors(hpcd45, dims = 1:20, verbose = TRUE, reduction = "pca")
## Computing nearest neighbor graph
## Computing SNN
hpcd45 <- FindClusters(hpcd45, verbose = TRUE, resolution = 0.8) #FindClusters() operates on the SNN graph built above, so no reduction argument is needed
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 7266
## Number of edges: 251812
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8182
## Number of communities: 17
## Elapsed time: 0 seconds
# Look at cluster IDs of the first 5 cells
head(Idents(hpcd45), 5)
## AAACCCAAGATTCGAA-1 AAACCCAAGTGCCTCG-1 AAACCCACAACCGACC-1 AAACCCACACTAGGCC-1 
##                  0                  3                  0                  3 
## AAACCCACACTTCAAG-1 
##                  5 
## Levels: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
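To see how the resolution parameter affects granularity, clusterings at several resolutions can be stored side by side in the object's metadata (a sketch; the resolution grid here is illustrative):

```r
# Each FindClusters() call adds a metadata column named RNA_snn_res.<value>
for (res in c(0.4, 0.8, 1.2)) {
  hpcd45 <- FindClusters(hpcd45, resolution = res, verbose = FALSE)
}

# Compare the number of clusters found at each resolution
sapply(grep("RNA_snn_res", colnames(hpcd45@meta.data), value = TRUE),
       function(col) length(unique(hpcd45@meta.data[[col]])))

# Restore the identities used downstream (resolution 0.8)
Idents(hpcd45) <- "RNA_snn_res.0.8"
```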

Run Non-Linear Dimensional Reduction

Seurat offers several non-linear dimensional reduction techniques, such as tSNE and UMAP, to visualize and explore these datasets. The goal of these algorithms is to learn the underlying manifold of the data in order to place similar cells together in low-dimensional space. Cells within the graph-based clusters determined above should co-localize on these dimension reduction plots. As input to the UMAP and tSNE, it is suggested to use the same PCs as input to the clustering analysis.

hpcd45 <- RunUMAP(hpcd45, dims = 1:20, verbose = TRUE, reduction = "pca")
## 10:16:09 UMAP embedding parameters a = 0.9922 b = 1.112
## 10:16:09 Read 7266 rows and found 20 numeric columns
## 10:16:09 Using Annoy for neighbor search, n_neighbors = 30
## 10:16:09 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 10:16:09 Writing NN index file to temp file /var/folders/f_/kwrjzcys6250cp1plhxyg21h0000gn/T//Rtmp1CSpnV/file1c1961090228
## 10:16:09 Searching Annoy index using 1 thread, search_k = 3000
## 10:16:11 Annoy recall = 100%
## 10:16:11 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 10:16:12 Initializing from normalized Laplacian + noise (using irlba)
## 10:16:12 Commencing optimization for 500 epochs, with 330034 positive edges
## 10:16:24 Optimization finished
#Visualize UMAP
DimPlot(object = hpcd45, reduction = "umap", label = TRUE, label.size = 6 )

table(Idents(hpcd45))
## 
##    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15 
## 1863 1005  940  605  578  515  503  264  253  178  158  144  117   69   42   17 
##   16 
##   15

Note the change in UMAP orientation due to the change in computing environment

We can save the filtered and cleaned object at this point for further downstream analyses

saveRDS(hpcd45, file = "hpcd45_cleaned_object.rds")
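The saved object can then be reloaded in a later session without repeating the QC, normalization, and clustering steps above:

```r
# Resume from the saved checkpoint in a fresh session
hpcd45 <- readRDS("hpcd45_cleaned_object.rds")
```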

Finding Differentially Expressed Features (Cluster Biomarkers)

Seurat can help us find markers that define clusters via differential expression. By default, it identifies positive and negative markers of a single cluster (specified in ident.1), compared to all other cells. FindAllMarkers() automates this process for all clusters, but you can also test groups of clusters vs. each other, or against all cells.

The min.pct argument requires a feature to be detected at a minimum percentage in either of the two groups of cells, and the logfc.threshold argument (thresh.test in older Seurat releases) requires a feature to be differentially expressed (on average) by some amount between the two groups. You can set both of these to 0, but with a dramatic increase in time, since this will test a large number of features that are unlikely to be highly discriminatory. As another option to speed up these computations, max.cells.per.ident can be set: this downsamples each identity class to have no more cells than the specified limit. While there is generally going to be a loss in power, the speed increases can be significant, and the most highly differentially expressed features will likely still rise to the top.

Note on key parameters passed when running FindMarkers(): min.pct: to speed up runs, only genes detected in at least this fraction of cells in either of the two populations are tested (default is 0.1). This is meant to speed up the function by not testing genes that are very infrequently expressed. Setting min.pct = 0 tests all genes, but takes considerably longer.

only.pos: only return positive markers; by default this value is FALSE, so both positive and negative markers are returned. Set only.pos = TRUE to restrict the output to positive markers.

logfc.threshold: limit testing to genes which show, on average, at least X-fold difference (log-scale) between the two groups of cells (default is 0.25). Increasing logfc.threshold speeds up the function, but can miss weaker signals. Setting logfc.threshold = 0 will return all expressed genes in case you need to look at the entire list; note that this is also computationally intensive and can take up to a day depending on the number of cells in the dataset.
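As a concrete illustration of these parameters, a single pairwise comparison between two clusters (the cluster identities here are illustrative; any pair from the clustering above can be substituted):

```r
# Markers distinguishing cluster 0 from cluster 3 only (rather than from
# all other cells), testing genes detected in >= 25% of cells in either
# group with at least a 0.25 average log2 fold-change, positive markers only
c0_vs_c3 <- FindMarkers(hpcd45, ident.1 = 0, ident.2 = 3,
                        min.pct = 0.25, logfc.threshold = 0.25,
                        only.pos = TRUE)
head(c0_vs_c3)
```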

(clusters <- 0:14) #cluster IDs 0 through 14; the two smallest clusters (15 and 16) are not tested

#No need to rerun this, as it takes a while when run at min.pct = 0 and logfc.threshold = 0 (to recover even weak signals)
for(i in clusters){
  cluster.markers <- FindMarkers(hpcd45, ident.1 = i, min.pct = 0.25, logfc.threshold = 0.25, only.pos = F)
  cluster.markers <- cluster.markers %>% arrange(desc(avg_log2FC))
  write.csv(cluster.markers, file=paste0("Markers/All_Cluster", i, "_Markers.csv"))
}
sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
##  [1] grid      splines   stats4    stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] knitr_1.42                    interp_1.1-3                 
##  [3] Hmisc_4.8-0                   Formula_1.2-4                
##  [5] survival_3.5-3                lattice_0.20-45              
##  [7] VennDiagram_1.7.3             futile.logger_1.4.3          
##  [9] venn_1.11                     formatR_1.14                 
## [11] splitstackshape_1.4.8         formattable_0.2.1            
## [13] usefun_0.4.8                  purrr_1.0.1                  
## [15] gridExtra_2.3                 data.table_1.14.8            
## [17] viridis_0.6.2                 viridisLite_0.4.1            
## [19] RColorBrewer_1.1-3            Rfast2_0.1.4                 
## [21] ape_5.7                       hdf5r_1.3.8                  
## [23] SeuratWrappers_0.3.0          stxBrain.SeuratData_0.1.1    
## [25] pbmcsca.SeuratData_3.0.0      pbmcMultiome.SeuratData_0.1.2
## [27] ifnb.SeuratData_3.1.0         SeuratData_0.2.1             
## [29] patchwork_1.1.2               devtools_2.4.5               
## [31] usethis_2.1.6.9000            MAST_1.22.0                  
## [33] SingleCellExperiment_1.18.1   SummarizedExperiment_1.26.1  
## [35] GenomicRanges_1.48.0          GenomeInfoDb_1.32.4          
## [37] IRanges_2.30.1                S4Vectors_0.34.0             
## [39] MatrixGenerics_1.8.1          matrixStats_0.63.0           
## [41] harmony_0.1.1                 Rcpp_1.0.10                  
## [43] WebGestaltR_0.4.5             monocle_2.24.1               
## [45] DDRTree_0.1.5                 irlba_2.3.5.1                
## [47] VGAM_1.1-7                    ggplot2_3.4.1                
## [49] Biobase_2.56.0                BiocGenerics_0.42.0          
## [51] reticulate_1.28               Matrix_1.5-3                 
## [53] dplyr_1.1.0                   cowplot_1.1.1                
## [55] SeuratObject_4.1.3            Seurat_4.3.0.1               
## 
## loaded via a namespace (and not attached):
##   [1] rappdirs_0.3.3         scattermore_0.8        R.methodsS3_1.8.2     
##   [4] tidyr_1.3.0            bit64_4.0.5            DelayedArray_0.22.0   
##   [7] R.utils_2.12.2         rpart_4.1.19           RCurl_1.98-1.10       
##  [10] doParallel_1.0.17      generics_0.1.3         lambda.r_1.2.4        
##  [13] callr_3.7.3            leidenbase_0.1.14      RANN_2.6.1            
##  [16] combinat_0.0-8         future_1.31.0          bit_4.0.5             
##  [19] tzdb_0.3.0             spatstat.data_3.0-0    httpuv_1.6.9          
##  [22] xfun_0.37              hms_1.1.2              jquerylib_0.1.4       
##  [25] evaluate_0.20          promises_1.2.0.1       fansi_1.0.4           
##  [28] igraph_1.4.0           DBI_1.1.3              htmlwidgets_1.6.1     
##  [31] sparsesvd_0.2-2        apcluster_1.4.10       spatstat.geom_3.0-6   
##  [34] ellipsis_0.3.2         backports_1.4.1        bookdown_0.32         
##  [37] deldir_1.0-6           vctrs_0.5.2            remotes_2.4.2         
##  [40] ROCR_1.0-11            abind_1.4-5            cachem_1.0.6          
##  [43] withr_2.5.0            progressr_0.13.0       checkmate_2.1.0       
##  [46] sctransform_0.3.5      prettyunits_1.1.1      goftest_1.2-3         
##  [49] svglite_2.1.1          cluster_2.1.4          lazyeval_0.2.2        
##  [52] crayon_1.5.2           spatstat.explore_3.0-6 labeling_0.4.2        
##  [55] pkgconfig_2.0.3        slam_0.1-50            vipor_0.4.5           
##  [58] nlme_3.1-162           pkgload_1.3.2          nnet_7.3-18           
##  [61] rlang_1.0.6            globals_0.16.2         lifecycle_1.0.3       
##  [64] miniUI_0.1.1.1         rsvd_1.0.5             ggrastr_1.0.1         
##  [67] polyclip_1.10-4        lmtest_0.9-40          rngtools_1.5.2        
##  [70] zoo_1.8-11             beeswarm_0.4.0         base64enc_0.1-3       
##  [73] whisker_0.4.1          ggridges_0.5.4         processx_3.8.0        
##  [76] pheatmap_1.0.12        png_0.1-8              bitops_1.0-7          
##  [79] R.oo_1.25.0            KernSmooth_2.23-20     doRNG_1.8.6           
##  [82] stringr_1.5.0          parallelly_1.34.0      spatstat.random_3.1-3 
##  [85] jpeg_0.1-10            readr_2.1.4            scales_1.2.1          
##  [88] memoise_2.0.1          magrittr_2.0.3         plyr_1.8.8            
##  [91] ica_1.0-3              zlibbioc_1.42.0        compiler_4.2.1        
##  [94] HSMMSingleCell_1.16.0  fitdistrplus_1.1-8     cli_3.6.0             
##  [97] XVector_0.36.0         urlchecker_1.0.1       listenv_0.9.0         
## [100] pbapply_1.7-0          ps_1.7.2               htmlTable_2.4.1       
## [103] MASS_7.3-58.2          tidyselect_1.2.0       stringi_1.7.12        
## [106] highr_0.10             yaml_2.3.7             latticeExtra_0.6-30   
## [109] ggrepel_0.9.3          sass_0.4.5             tools_4.2.1           
## [112] future.apply_1.10.0    parallel_4.2.1         rstudioapi_0.14       
## [115] foreign_0.8-84         foreach_1.5.2          farver_2.1.1          
## [118] rmdformats_1.0.4       Rtsne_0.16             RcppZiggurat_0.1.6    
## [121] digest_0.6.31          BiocManager_1.30.19    shiny_1.7.4           
## [124] qlcMatrix_0.9.7        later_1.3.0            RcppAnnoy_0.0.20      
## [127] httr_1.4.4             colorspace_2.1-0       fs_1.6.1              
## [130] tensor_1.5             uwot_0.1.14            spatstat.utils_3.0-1  
## [133] sp_1.6-0               plotly_4.10.1          sessioninfo_1.2.2     
## [136] systemfonts_1.0.4      xtable_1.8-4           futile.options_1.0.1  
## [139] jsonlite_1.8.4         Rfast_2.0.7            R6_2.5.1              
## [142] profvis_0.3.7          pillar_1.8.1           htmltools_0.5.4       
## [145] mime_0.12              glue_1.6.2             fastmap_1.1.0         
## [148] codetools_0.2-19       pkgbuild_1.4.0         utf8_1.2.3            
## [151] bslib_0.4.2            spatstat.sparse_3.0-0  tibble_3.1.8          
## [154] ggbeeswarm_0.7.1       leiden_0.4.3           admisc_0.30           
## [157] limma_3.52.4           rmarkdown_2.20         docopt_0.7.1          
## [160] fastICA_1.2-3          munsell_0.5.0          GenomeInfoDbData_1.2.8
## [163] iterators_1.0.14       reshape2_1.4.4         gtable_0.3.1